Multiple social encounters can eliminate Crozier’s paradox and stabilise genetic kin recognition

Crozier’s paradox suggests that genetic kin recognition will not be evolutionarily stable. The problem is that more common tags (markers) are more likely to be recognised and helped. This causes common tags to increase in frequency, and hence eliminates the genetic variability that is required for genetic kin recognition. It has therefore been assumed that genetic kin recognition can only be stable if there is some other factor maintaining tag diversity, such as the advantage of rare alleles in host-parasite interactions. We show that allowing for multiple social encounters before each social interaction can eliminate Crozier’s paradox, because it allows individuals with rare tags to find others with the same tag. We also show that rare tags are better indicators of relatedness, and hence better at helping individuals avoid interactions with non-cooperative cheats. Consequently, genetic kin recognition provides an advantage to rare tags that maintains tag diversity, and stabilises itself.

Line 241: "Third, Equation 1…" But that equation is basically just Hamilton's rule, and so it's not a prediction that stems from your theory. I suggest removing this.
Line 245: Could also note that genetic kin recognition is more favoured when there is little cost to checking large numbers of individuals to find a rare match, and when it's actually possible to check many individuals. However as noted above, this is similar to saying that Crozier's paradox doesn't hold if the biological assumptions that led to its formulation are not true.
Reviewer #2: Remarks to the Author: This is a fantastic paper. It presents new theory on selection for tag-based altruism and a resolution for Crozier's paradox, a longstanding dilemma in social evolution theory. Crozier's paradox states that though genetic cues or tags are used to identify relatives, using them in this way is unstable, because common tags have higher fitness than rare ones. This comes about because common cues will match more often and get more kin altruism. Thus, theory tends to say that tag-based kin recognition should be absent or uncommon, but we have many empirical examples. The authors' theoretical innovation is very simple in principle: allow mismatched individuals (mostly rare tags) to break off and search further for a matching partner. This turns out to stabilize genetic kin recognition, thus solving an important problem. The paper is also very clearly written.
The mathematical theory is of course more daunting and most of it is relegated to a very long supplement. My initial reaction as a reviewer was that a 75-page supplement was going to be too much. But on reading it, I changed my mind. It is a superb exposition of the models, extremely clear in its goals, assumptions, methods, and conclusions. While not every reader will want to dig through the detailed models, this way of writing it up maximizes the audience that could understand it if they put in the effort. I only wish that all theoretical papers were written like this.
My only significant comment is that I'd like to see a little more clearly how different values of alpha affect outcomes. In Fig. 4a, having a very high alpha of 0.975 already means a significant portion of the area where kin discrimination is favored does not support stable recognition. So what about lower values? Figure 4d gives us an idea about this, but for probability of interaction rather than alpha. Going back to 4c gives us the relationship between those two variables, and together c and d seem to confirm that for tag frequency near zero, you need very high alphas. So, does that mean, for lower alphas, that low tag frequencies (high tag numbers) won't be stable, but higher frequencies (lower tag numbers) will be? A little more discussion of this point would be helpful, particularly since you consider only alpha=1 in the detailed model. Alpha is the parameter that makes your model work where other models did not, and yet it seems to be the most poorly explored parameter.
I don't really like "networks" in the title. Networks are not the key to the solution of the problem and the word is hardly mentioned in the text. It will also lead readers into expecting a kind of model that this is not, with emphasis on nodes and edges etc. The real key to this model is partner searching or partner choice (though based on tags rather than past behavior). Shouldn't that be highlighted in the title?
Line 75. Later on I found that I had forgotten the important distinction here between encounters and interactions. It might help to insert here something along the lines of the following piece from the supplement: "individuals can engage in sufficiently many social encounters before committing to a given social interaction".
105. What does the assumption of minimal diversity mean? Truly minimal would be no diversity.
SUPPLEMENT
p 1 bottom. "Partner recycling" seems a confusing term for this process. It implies a partner is going to be used again, which is not true here, but is true in standard partner-choice models where a partner is kept if it shows the right behavior. Consider alternatives like "partner search", "tag search" or "tag choice".
p 5 par 3. I have trouble figuring out how search costs work. Are you making assumptions here for mathematical convenience or to match biology?
p 6 top. "density dependence is bland" seems an odd description.
p 7 par 1. The two probabilities making up the cost are given in one order verbally and in the opposite order in the math, which is unnecessarily confusing.
eqns 1 and 2. I'm having trouble seeing how the denominators of the fractions work. Is there a series argument here? Added later: I get this now after reading your later explanation of equation 25. So maybe some of that explanation needs to be moved up to here. But I would also more explicitly add, because this is what I was failing to see, that this same probability applies to every step of the search process - even as the numbers of new interactions decline, the probability stays the same.
eqns 1 and 2. Doesn't your A term imply that the fitness components like b and c effectively vary over the generations as the number of altruistic interactions changes? Normally b and c are defined in terms of relative fitness and they go into the calculation of mean fitness, so mean fitness changes while b and c remain constant. Here it seems the reverse: mean fitness stays constant, implying that b and c change? I guess it boils down to this: I'm thinking about hard selection (of the global population) and you are modeling soft selection. It might be worth pointing out.
p 13, an aside regarding Levin & Grafen. You cite reference 42 for both the old and the newer way of calculating inclusive fitness. Should it be two different references?
p 64 par 2. "and this will require a low cost of kin recognition (high α)" - is "high α" what you mean here?
p 64 par 4. "If the evolutionary constraint on tag diversity is relaxed completely (Lmax→∞), then genetic kin recognition may evolve (Equation 43 satisfied) across most of the parameter space (i.e. across most values of m, N and r)." Do you mean for most values of m, N, and r for which inclusive fitness predicts cooperation?
p 65 par 2. Is this agent-based model identical to your island model, other than being agent-based?
p 65 par 5-6. I'm not seeing the connection between these two paragraphs. The first says that conditional altruism will fall to close to zero (but held above by mutation). This will maintain rare tags. The second paragraph has those rare tags and conditional altruism being favored, the latter to fixation. What specifically has changed to cause conditional altruism to switch from being selected against to being selected to fixation?
Reviewer #3: Remarks to the Author: The manuscript addresses a very interesting question, though I have a few comments.
1. Some researchers argue that there are no examples of genetic kin recognition, including one of the authors (Grafen 1990 argued that the only evidence comes from one species of marine invertebrates, Botryllus schlosseri), and yet this manuscript ignores this issue.
If there are good examples in the literature, then these should be addressed, but if genetic kin recognition does not exist or there are few or no good examples, then the point of such a model is unclear -and its importance is dubious.
2. The introduction (Crozier's Paradox) summarizes the theoretical models that have addressed this topic, but it does not accurately summarize the field or previous theoretical work.
Contrary to what the authors assert, it is not widely accepted that kin recognition via genetic cues is evolutionarily unstable (p.2, line 29), as papers cited and the authors themselves have helped to keep this hypothesis viable. So, this assertion is inaccurate, which can be easily shown by any review on this topic.
The authors state that previous models find that genetic kin recognition is only stable under 'very restrictive conditions' (p. 2, line 39; line 227). The model by Rousset and Roze (2007), which is cited, found that kin recognition can be maintained given certain assumptions, i.e., spatial population structure (from low dispersal), recombination between matching and helping loci, and negative frequency-dependent selection from parasites, but these are realistic, not unrealistically restrictive. It is unclear which aspects of this and other models are unrealistic or overly restrictive. Grafen (1990) proposed that the main problem with Crozier's Paradox is that it assumes that there are no cheats, but this issue is buried in the supplement.
3. The assumptions of the authors' model allow multiple encounters, which is expected with realistic spatial population structure, which is in line with previous models.
4. The authors assert that their conclusions are robust to several assumptions. It is crucial to show that tag diversity is not eroded by genetic drift (p 7, line 151), and this issue is too important for the supplements and should be addressed in the manuscript. Also, the paper should address how robust their results are against these assumptions.
Many other important issues are found only in the supplement.
5. The authors state that their model makes only conservative assumptions (p 13, line 230), but they do not explain why --or why they are more conservative than previous models.
6. The main question here is: 'can genetic recognition evolve with fewer assumptions than previous models?' The authors suggest that host-parasite interactions, the solution usually cited to solve Crozier's Paradox, 'may be a red herring' (p 13, line 237). If so, then the authors should clarify why their model assumptions are not only simpler but also more realistic, and acknowledge that their model is not a mutually exclusive alternative.
7. There are many terms that are not defined in the manuscript, such as 'kin recognition', 'pedigree relatedness', etc, and it is critical to define terms because they are used differently by different researchers. For example, Grafen (1990) was criticized for generating semantic confusion over the term 'kin recognition'. The authors contrast genetic kin recognition with environmentally determined phenotypic tags, but then use an example of 'grew up in the same nest' (p. 2, line 41), which is not a phenotypic label. There are reviews that have addressed these semantic issues, but they are not cited.
Reviewer #1 (Remarks to the Author): The paper presents a model of genetic kin recognition and cooperation based on highly variable, inherited "tags" that allow individuals to pair up with same-tag individuals, who are more likely than average to be close relatives. Inclusive fitness theory predicts that costly altruism should evolve more easily when it is directed at relatives, and earlier work has examined the coevolution between these tags and alleles that encode altruistic and nonaltruistic behaviours. In 1986, Ross Crozier noted that individuals with rare recognition tags should receive less help than common-tag individuals, and might have a more difficult time locating social partners, giving them low fitness - this causes common tags to become more common, until the recognition cues become uniform and thus useless as cues of relatedness, creating what is now called Crozier's paradox.
This model extends that earlier work by changing how cooperative interactions are modelled, and finds that Crozier's paradox no longer holds - this is the paper's main conclusion. Specifically, the model assumes that individuals can form a pair with one other individual, and potentially help each other, and that pair formation depends upon genetic tags. There are two types of individual (conditional helpers, and non-helpers), controlled by a genetic polymorphism, and there is a second locus controlling the recognition tag. Individuals form their pairs based on the tag, in an iterative process. If the first individual they meet has the same tag, a pair is formed. If not, they potentially continue to inspect other individuals, which happens with probability alpha, until they meet a same-tag individual. If the focal individual has the helping allele and its partner has the same tag, it helps the other individual at a cost to itself, otherwise no helping takes place. When alpha tends to 1, rare-tagged individuals always manage to find a partner, no matter how rare their tag is, and under this high-alpha scenario, Crozier's paradox does not hold. This is because when rare-tagged individuals always manage to pair up (and there is no cost to searching them out), there is a net benefit to carrying a rare tag, because these tags are better indicators of relatedness and hence of altruism strategy.
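As a reading aid, the iterative pairing process summarised above can be sketched as a small simulation. This is a minimal illustration and not the authors' code. It assumes an effectively infinite deme, so that every encounter independently yields a same-tag individual with probability equal to the tag frequency. The names `search_for_partner`, `estimate`, `tag_freq` and `max_encounters` are hypothetical, and search is treated as cost-free.

```python
import random


def search_for_partner(tag_freq, alpha, rng, max_encounters=10_000):
    """Simulate one focal individual's partner search.

    tag_freq: frequency of the focal individual's tag (assumed constant,
              i.e. an effectively infinite deme).
    alpha:    probability of continuing to search after a mismatch.
    Returns (paired, encounters).
    """
    encounters = 0
    while encounters < max_encounters:  # safety cap, illustrative only
        encounters += 1
        if rng.random() < tag_freq:
            return True, encounters   # same-tag individual met: pair forms
        if rng.random() >= alpha:
            return False, encounters  # mismatch and the search is abandoned
    return False, encounters


def estimate(tag_freq, alpha, trials=100_000, seed=0):
    """Estimate pairing probability and mean number of encounters."""
    rng = random.Random(seed)
    paired = 0
    total_encounters = 0
    for _ in range(trials):
        ok, n = search_for_partner(tag_freq, alpha, rng)
        paired += ok
        total_encounters += n
    return paired / trials, total_encounters / trials


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.975, 1.0):
        p, n = estimate(tag_freq=0.05, alpha=alpha)
        print(f"alpha={alpha}: P(paired)~{p:.3f}, mean encounters~{n:.1f}")
```

Under these assumptions, a rare-tag individual at alpha = 1 always pairs, but only after a number of encounters that grows as its tag becomes rarer, which is why the treatment of search costs matters for the conclusions.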
One odd aspect is that the paper has supplementary material running to 75 pages, which contains many important details and results that are not discussed or even alluded to in the main paper. I read the main paper first, and had several questions that could be answered by reading the supplement. For example, I was confused why the authors chose to model a scenario where searching through 100s of individuals to find one with a matching, rare tag was assumed to have no fitness costs, relative to being able to easily find a common-tagged partner, because this seems to completely undermine the model. In the supplement, I see that there was indeed a parameter called C_search, which controls the costliness of this searching process, and indeed setting C_search > 0 makes it less likely that the model resolves Crozier's paradox. However, C_search is never mentioned and is barely alluded to in the main manuscript, and I assume that all the figures in the main manuscript were generated assuming that C_search = 0. I am puzzled why the authors do not discuss this work in the main paper, especially because C_search is a biologically reasonable addition to the model, and its value completely determines the conclusions. Instead, the paper simply notes that the results are "robust" to the assumption that there are costs of partner searching (citing the supplement), although this claim seems to be contradicted by Figure S6 in the supplement (which illustrates that C_search > 0 reduces the parameter space where genetic kin recognition and altruism can evolve).

We fully take on board this concern, and have added an extensive analysis of partner search cost (c_search) to the island model (i.e. the model presented in the main text). The mathematical details are presented in the SI (pages 13-15, 31 & interspersed text), and the results have been added to the main text (under the "Encounter rate and search cost" heading on p11, & Fig. 5).
In particular, we explain what we meant by the claim that the results are robust to an increase in c_search (though we no longer use the overly-strong word "robust"). We show that genetic kin recognition is less likely to be destabilised by an increase in the search cost (c_search) than by a decrease in the partner search rate (α). This is because the benefit of tag rarity - more cooperative social interactions - shines through more strongly when rare-tag individuals have more social opportunities. Increasing c_search does not limit these social opportunities, whereas decreasing α does, which is why c_search is less likely to destabilise genetic kin recognition.
Overall I think the paper would be greatly improved by properly describing all the work in the supplement in the main paper. I wonder what it's there for if it doesn't warrant a mention in the paper (e.g. the simple vs island model distinctions).
We have taken this on board, removing all non-essential information (e.g. the "simple model" analysis) from the SI, and making sure that all key results in the SI are also described in the main text (e.g. examination of c_search; a more detailed analysis of the encounter parameter, α; additional examination of the strength of balancing selection in finite populations).

Specific comments:
Line 57-59: "Previous theory has assumed…" This isn't true of the Holman et al. simulation model, which assumed that individuals interact with all N individuals within their own patch and potentially help (or receive help) from each of them depending on their recognition cues (sort of like alpha = 1 in your model, but with partnerships forming with all matching-tag individuals in the patch, instead of just one as in your model). So, it seems untrue that this is a 'first' for the present model. However, your model is an advance over that one in the sense that you consider the possibility that individuals get to inspect between 1 and all N members of their patch in their search for a single social partner, instead of all or nothing.
We see that our sentence was badly worded. We have changed it to read: "Previous theory has assumed that, when an individual encounters a partner with a different tag, the opportunity to socially interact is wasted". It is the ability to abandon tag-mismatched partners in favour of new social encounters (partner search) that is the 'first' for our model - we hope this is clearer now.
Line 63: "If multiple encounters occur, then individuals with rare tags could find individuals with the same tag and receive as much help as individuals with common tags" But surely, the individuals with common tags would more easily find same-tag individuals, and find more of them, and thus probably still have higher fitness than rare-tag individuals under many realistic assumptions? Your model assumes that there is no cost to searching for a partner with the same tag, and so you assume that individuals finding a partner on their first try have the same fitness as those that have to check all N individuals to find one. [update: you relax this assumption in the supplement, but never mention it in the main text - as I expected, adding search costs can restore Crozier's paradox if they are high enough]

See above reply regarding our new analyses of partner search cost (c_search).
Line 75: Is the group size N finite or infinite? If it's finite, what happens to individuals who are the only one with their tag within their patch -do they never associate with a partner? If this never happens because groups are so large, you may have removed one of the potential fitness costs of having a rare tag (i.e. higher risk of being unable to engage in social interactions).
We have two points in reply to this:
1) Regarding whether group size is infinite or finite, we constructed our mathematical model under the assumption that there is no stochastic variation in the genetic composition of demes. This means that our mathematical model is only accurate for the case where group size (N) is infinite (this eliminates stochastic variation), and / or when partner search (α) is zero (this renders stochastic variation unimportant for selection). In the case where α>0, the model becomes less accurate as deme size (N) is reduced. So the key point here is that, technically, deme size is assumed to be finite (this is necessary because we vary deme size when presenting certain results), but we also recognize that there is a mathematical inaccuracy here. This mathematical inaccuracy is inevitable in order to construct an analytically tractable model. Importantly, we check via agent-based simulation that our results tend to hold when stochastic deme variation is accounted for, even when deme size (N) is low. This more nuanced description of the assumptions behind deme size (N) and deme genetic composition has been added to the SI (e.g. pages 4, 5, 31) and main text (lines 73-75, 155-157).
2) Your point about rare-tag individuals sometimes entering demes without tagmates, and therefore being unable to find a tag-matched partner, irrespective of partner search, is a valid one - we completely agree. We have added a detailed account of this concern to the SI (page 31). Fortunately, this extra cost for rare tags does not tend to alter results, even for small deme size (N), as shown in our agent-based simulation analyses, which account for this (SI pages 31-34).
Line 89: Again this doesn't apply to the Holman et al model, which assumes that individuals check all others in their patch and socially interact with all of them that share the tag.
We see that we have introduced some semantic confusion here. It is true that, in Holman et al., "individuals check all others in their patch and socially interact with all of them that share the tag". Indeed, this is a standard assumption in models of tag-based helping, including Rousset and Roze (2007). However, these models are all still considering the α=0 case, because social interaction rate is fully determined by the frequency of the individual's tag. For instance, in these models, if an individual has a rare tag, such that only one other member of its group has the same tag, that individual will socially interact once that generation. Conversely, an individual with a more common tag, such that e.g. 10 other individuals in its group have the same tag, will socially interact 10 times that generation. This contrasts with the α=1 case, in which social interaction rate is independent of tag frequency. The α=1 case has not been permitted in any theoretical treatment prior to ours, as far as we are aware.
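The contrast between the α=0 and α=1 cases can be made explicit with a short worked expression. This is an illustrative sketch and not necessarily the form used in the authors' supplement. It assumes the single-partner search process of the present model and a deme large enough that each encounter is with a same-tag individual with probability x, the tag's frequency. The symbols x and P are introduced here for illustration only. Matching on encounter k+1 requires k mismatches, each followed by a decision to keep searching:

```latex
\begin{align}
P(\alpha, x) &= x \sum_{k=0}^{\infty} \bigl[\alpha (1 - x)\bigr]^{k}
              = \frac{x}{1 - \alpha (1 - x)}, \\
P(0, x) &= x, \qquad P(1, x) = 1 \quad (x > 0).
\end{align}
```

Under these assumptions, the probability of securing a social interaction tracks tag frequency exactly when α=0 and is independent of it when α=1, with intermediate values of α interpolating between the two.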
We recognise that this issue may have arisen because we said in places that our study, but not previous ones, allowed multiple social encounters. What we really mean is that our study, but not previous ones, allowed multiple social encounters before each social interaction (partner search). We have made sure to add in this "before each social interaction" qualification throughout the text.

Amended.
Line 198: "linkage disequilibrium favouring rare tags". It's not strictly the LD [with the cooperation locus] that provides a fitness advantage to the rare tags. Instead, individuals that have a rare tag are more likely to receive help from their social partner than individuals with common tags are (i.e. there is LD at the population level between the tag and helping loci), and this extra help to rare-tagged individuals increases their fitness. I know it's a semantic point but you might be able to word this bit more clearly.
We have changed the wording to "more cooperative social interactions (due to linkage disequilibrium) favouring rare tags."
Line 209: "The probability of social interactions needs to drop significantly below 1.0 before genetic kin discrimination is likely to be less favoured" This is completely dependent on the specific parameters used here (e.g. the values of b and c and L_max), and also on the biology assumed in your model. For example, I think this run of the model assumes that an individual that has to check 100 partners before finding a matching tag one has the same fitness as individual who gets it right the first time. Relaxing that assumption would presumably change Figure 4d and other results. At present, the results could be read as "If we assume some of the main effects previously proposed to lead to Crozier's paradox are absent, such as greater search costs and more failures to find a partner for rare-tag individuals, then the paradox is resolved"; this doesn't invalidate your paper in my view, but it changes it from "we've solved the paradox" to "we've identified which biological assumptions lead to the paradox more clearly than before".
We have changed the wording to "the probability of social interactions often needs to drop significantly below 1.0" - i.e. we have added in the word "often", to indicate that these results hold for most biologically reasonable parameterisations of the model.
Line 226: "Previous studies found that tag diversity could not be maintained by selection alone". This is not correct. I think you mean something like "could not be maintained by selection to avoid associating with non-cooperators", or similar. For example, the Holman et al. model considered a second source of diversifying selection acting on the tag locus (from sexual selection) and found (unsurprisingly) that it could counteract the Crozier effect if that second source of selection was strong enough - but it's still selection. What else maintains genetic diversity, if not selection? (mutation I guess, but that's not what you mean here).
We have changed the wording to "Previous studies found that, in general, tag diversity could not be maintained by selection on social behaviour alone" -i.e. we have added in "in general" and "on social behaviour". This statement, to the best of our knowledge, is accurate. We have also added more explanation of this statement to the SI (pages 38-39), which is referenced in the main text (lines 280-286).
In particular, Rousset & Roze (2007) found that 2 tags can be maintained, under very low recombination and mutation (restrictive conditions), by selection on social behaviour alone (i.e. without tag mutation), but no more than 2 tags can be maintained without tag mutation.
Furthermore, Holman et al. found that tag diversity can be maintained by extrinsic balancing selection, if tags have an additional role in mate choice. This is similar to Crozier's suggestion that tag diversity can be maintained if tags have an additional role in host-parasite interactions. We have added a detailed discussion of these theories to the SI (pages 38-39) and main text (lines 292-312), but these invoke selective pressures extrinsic to the evolution of social behaviour.
Line 229: "worst case scenario" Conversely, you assume there are no costs to rare-tag individuals that have to check dozens of partners before finding a match, and have little or no risk of going unpaired because patches are large or infinite (I'm not sure which it is), which is a best-case scenario for them.
We have changed the sentence to "Additionally, our theory modelled a relatively unfavourable scenario for genetic kin recognition, and so our finding that it can be stable may be conservative." We see that "worst case scenario" was overly strong language. However, the issues you raise - regarding partner search cost, and infinite deme size - are both accounted for, either in our mathematical model, or in our agent-based simulation.
Line 238: "Second, our theory emphasises the need to measure the frequency with which individuals encounter other individuals with the same tag (Fig. 4d). When this frequency is high, corresponding to a high alpha…" This seems incorrect. In the methods, you describe alpha differently, as the probability an individual goes off to keep searching when it encounters a mis-matched partner. Also, your theory does not mean that researchers should go out and measure how often tag-matched individuals encounter one another, I think. What matters is the relative fitness effects of cooperation for individuals with common vs rare tags -if the rare-tagged individuals have higher socially-based inclusive fitness (due to avoiding pairing up with unhelpful cheaters, directing their help to other cooperators, etc) then it would demonstrate that the Crozier effect (from being unable to easily find and be found by same-tag individuals) is outweighed by the LD effect you mention.
We have changed this passage to "Second, our theory emphasises the need to measure the frequency with which individuals encounter other individuals to allow them a reasonable chance of encountering one with the same tag (Fig. 4d). When this frequency is high, corresponding to a high α…" (lines 318-321). We see that our previous wording was not quite right: it is the rate with which individuals encounter each other (our α parameter) that should be measured in general (or a proxy for it), rather than the rate with which individuals with the same tag encounter each other.

With regard to your suggestion, one issue is that it seems to require knowing the population frequency of recognition tags (though you may have ideas for getting around this). This is likely to be empirically difficult, particularly if the genetic basis of genetic kin recognition is unknown. We agree though that examining the inclusive fitness consequences of conditional versus unconditional (or less conditional) behaviour is important, and indeed this is a key result of our analysis, encapsulated by our Hamilton's Rule condition (Equation 1). However, this is not the whole story, because genetic kin recognition will often not be stable if partner search is limited (low α) or costly (high c_search), even if conditional altruism is a favourable strategy. This is why measuring α in natural populations (or a proxy for it, such as time spent searching for a social partner or aggregating into a multicellular group) is also important.
Line 241: "Third, Equation 1…" But that equation is basically just Hamilton's rule, and so it's not a prediction that stems from your theory. I suggest removing this.

The reviewer is correct to note that Equation 1 is very similar in form to the versions of Hamilton's rule derived by Taylor & Queller amongst others. In these formulations, we have Rb - c - R'(b - c) > 0, where R is the relatedness between social partners, and R' is the relatedness between competitors. Our version is similar to this. However, the novelty in our formulation is that R & R' are derived as functions of tag frequency (with R being more strongly affected by a change in tag frequency than R'). Our derivation of this modified Hamilton's Rule gives a necessary condition for kin discrimination based on genetic cues to evolve. This link between Hamilton's rule and genetic kin recognition is important and, we think, worth pointing out. It is explicitly derived from our population genetic model in the SI (Section 3b), and is a novel prediction, so we disagree with the statement that "it's not a prediction that stems from your theory". We have added "(derived in Supp. 3b)" to the main text when introducing Equation 1, to emphasise that it is a prediction that stems from our theory.
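As an illustration only (not taken from the manuscript or SI), the general inequality quoted above, Rb - c - R'(b - c) > 0, can be checked numerically. The function names and the input values are hypothetical; in the paper R and R' are derived as functions of tag frequency, and that derivation is not reproduced here.

```python
def hamilton_lhs(R: float, R_prime: float, b: float, c: float) -> float:
    """Left-hand side of R*b - c - R'*(b - c) > 0.

    R       : relatedness between social partners
    R_prime : relatedness between competitors
    b, c    : benefit and cost of helping
    All inputs are hypothetical numbers supplied by the caller.
    """
    return R * b - c - R_prime * (b - c)


def helping_favoured(R: float, R_prime: float, b: float, c: float) -> bool:
    """True when the inequality is satisfied."""
    return hamilton_lhs(R, R_prime, b, c) > 0


if __name__ == "__main__":
    # Hypothetical values: high partner relatedness versus low partner relatedness.
    print(helping_favoured(R=0.5, R_prime=0.1, b=3.0, c=1.0))
    print(helping_favoured(R=0.2, R_prime=0.1, b=3.0, c=1.0))
```

Raising R while holding R' fixed moves the left-hand side upward, which is the sense in which a tag that is a better indicator of relatedness makes conditional helping easier to favour.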
Line 245: Could also note that genetic kin recognition is more favoured when there is little cost to checking large numbers of individuals to find a rare match, and when it's actually possible to check many individuals. However as noted above, this is similar to saying that Crozier's paradox doesn't hold if the biological assumptions that led to its formulation are not true.
We have changed the sentence to: "Genetic cues could be more likely to be favoured when there is greater opportunity for multiple low-cost social encounters (higher α & lower c_search), for instance, when social groups are more compact (dense social networks)." i.e. we have added in "low-cost" and "& lower c_search".

Reviewer #2 (Remarks to the Author):
This is a fantastic paper. It presents new theory on selection for tag-based altruism and a resolution for Crozier's paradox, a longstanding dilemma in social evolution theory. Crozier's paradox states that though genetic cues or tags are used to identify relatives, using them in this way is unstable, because common tags have higher fitness than rare ones. This comes about because common cues will match more often and get more kin altruism. Thus, theory tends to say that tag-based kin recognition should be absent or uncommon, but we have many empirical examples. The authors' theoretical innovation is very simple in principle: allow mismatched individuals (mostly rare tags) to break off and search further for a matching partner. This turns out to stabilize genetic kin recognition, thus solving an important problem. The paper is also very clearly written.
The mathematical theory is of course more daunting and most of it is relegated to a very long supplement. My initial reaction as a reviewer was that a 75-page supplement was going to be too much. But on reading it, I changed my mind. It is a superb exposition of the models, extremely clear in its goals, assumptions, methods, and conclusions. While not every reader will want to dig through the detailed models, this way of writing it up maximizes the audience that could understand it if they put in the effort. I only wish that all theoretical papers were written like this.

Thank you very much for your kind words.
My only significant comment is that I'd like to see a little more clearly how different values of alpha affect outcomes. In Fig. 4a, having a very high alpha of 0.975 already means a significant portion of the area where kin discrimination is favored does not support stable recognition. So what about lower values? Figure 4d gives us an idea about this, but for probability of interaction rather than alpha. Going back to 4c gives us the relationship between those two variables, and together c and d seem to confirm that for tag frequency near zero, you need very high alphas. So, does that mean, for lower alphas, that low tag frequencies (high tag numbers) won't be stable, but higher frequencies (lower tag numbers) will be? A little more discussion of this point would be helpful, particularly since you consider only alpha=1 in the detailed model. Alpha is the parameter that makes your model work where other models did not, and yet it seems to be the most poorly explored parameter.

We see that we went through the analysis of the encounter parameter (α) too quickly, so we have added more explanation and new results to the main text (lines 203-232).
Your suspicion that, as α is reduced from 1, the stability of genetic kin recognition quickly falls off, is correct. We have added in this specific plot (Fig. 4c), and we now explicitly state this result in the text (lines 203-204). However, this does not mean that genetic kin recognition is only likely to be stable in extreme cases, and we list two reasons for this.
The first reason is that we need to think about what our mathematical parameter α represents biologically. One way to think about it is that it gives a proxy for what the likelihood is of socially interacting, per opportunity to socially interact. We plotted this relationship between α and the probability of socially interacting, for different tag frequencies. This shows that the probability of socially interacting falls off as α decreases, and this fall-off is steeper for individuals that are using rarer tags.
We then focus on the relationship between α and the social interaction probability that is obtained when individuals are using a limitingly rare tag (i.e. the line labelled "~0" in Fig. 4d). We can then ask, how does the stability of genetic kin recognition vary as the probability of social interaction, for an individual using a limitingly rare tag, varies. This is what we plot in Fig. 4e, and this is the key result, which shows that, on a biological interpretation of our mathematical parameter α, kin discrimination based on genetic cues is likely to be stable, unless the opportunity to socially interact when an individual has a rare tag is heavily diminished. We have gone through these steps more slowly now, which should hopefully make things a bit clearer.
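The relationship between α and the probability of socially interacting can be sketched under a deliberately simplified assumption that is not the paper's island model: a well-mixed population in which every encounter is with a same-tag individual with probability p (the tag frequency), independently of earlier encounters, and in which a mismatched individual searches again with probability α. Summing the resulting geometric series gives p / (1 - α(1 - p)). The function name and the numbers below are hypothetical.

```python
def interaction_probability(alpha: float, p: float) -> float:
    """Probability of eventually pairing with a same-tag individual.

    Assumes (for illustration only) that each encounter matches with
    probability p and that a mismatch is followed by another encounter
    with probability alpha:
        p + alpha*(1-p)*p + (alpha*(1-p))**2 * p + ... = p / (1 - alpha*(1-p))
    """
    if not (0.0 <= alpha <= 1.0) or not (0.0 < p <= 1.0):
        raise ValueError("need 0 <= alpha <= 1 and 0 < p <= 1")
    return p / (1.0 - alpha * (1.0 - p))


if __name__ == "__main__":
    # Hypothetical tag frequencies, from common to rare.
    for p in (0.5, 0.1, 0.01):
        row = [round(interaction_probability(a, p), 3) for a in (1.0, 0.975, 0.9, 0.5, 0.0)]
        print(p, row)
```

Under this simplification α = 0 recovers an interaction rate equal to the tag frequency, α = 1 gives certain interaction for any tag, and the drop as α falls below 1 is steeper for rarer tags, in line with the qualitative description above.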
You also ask, "So, does that mean, for lower alphas, that low tag frequencies (high tag numbers) won't be stable, but higher frequencies (lower tag numbers) will be?" This is sometimes the case. But more often, for lower alphas, one single tag runs all the way to fixation, meaning no tags are maintained.
The second reason why it is not the case that genetic kin recognition is only likely to be stable in extreme cases, is that the results of our mathematical model are based on the case where selection on social behaviour (magnitude of b and c) is weak. When the strength of selection is increased, it becomes more likely that genetic kin recognition can be stable for lower values of α. We have added discussion of this point to lines 222-232, and a figure (Fig. 4f).
I don't really like "networks" in the title. Networks are not the key to the solution of the problem and the word is hardly mentioned in the text. It will also lead readers into expecting a kind of model that this is not, with emphasis on nodes and edges etc. The real key to this model is partner searching or partner choice (though based on tags rather than past behavior). Shouldn't that be highlighted in the title?
We have taken your advice and changed the title to: "Multiple social encounters can eliminate Crozier's paradox and stabilise genetic kin recognition".
line 75. Later on I found that I had forgotten the important distinction here between encounters and interactions. It might help to insert here something along the lines of the following piece from the supplement "individuals can engage in sufficiently many social encounters before committing to a given social interaction".
We have inserted "Individuals can potentially have many social encounters before committing to a given social interaction."
105. What does the assumption of minimal diversity mean? Truly minimal would be no diversity.
This is true. Specifically, we assumed that one tag had an initial frequency of 0.9, and all remaining tags had a frequency of approximately 0.1 / (L_max - 1). This is explained in the SI, but in the main text we have changed "minimal" to "low" tag diversity.
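To make the "low" starting diversity concrete, the sketch below builds the initial frequencies described above (one tag at 0.9, the rest sharing 0.1 equally) and summarises them with an inverse Simpson index, 1 / Σ p_i². The choice of that index is an assumption made here for illustration; it is one common frequency-weighted measure of diversity and is not claimed to be the metric used in the SI. The function names and the value L_max = 11 are hypothetical.

```python
def initial_tag_frequencies(l_max: int, common: float = 0.9) -> list[float]:
    """One common tag at frequency `common`; the remaining l_max - 1 tags
    share the rest equally, as described in the response above."""
    if l_max < 2:
        raise ValueError("need at least two tags")
    rare = (1.0 - common) / (l_max - 1)
    return [common] + [rare] * (l_max - 1)


def inverse_simpson(freqs: list[float]) -> float:
    """Frequency-weighted diversity, 1 / sum(p_i^2) (an assumed metric)."""
    return 1.0 / sum(f * f for f in freqs)


if __name__ == "__main__":
    start = initial_tag_frequencies(l_max=11)   # hypothetical L_max
    print(len(start), inverse_simpson(start))   # 11 tags present, index close to 1
    print(inverse_simpson([1.0 / 11] * 11))     # equal frequencies give 11
```

A raw count would report 11 tags in both cases, whereas the frequency-weighted index separates a population dominated by one tag from one in which tags have equalised.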
SUPPLEMENT p 1 bottom. "Partner recycling" seems a confusing term for this process. It implies a partner is going to be used again, which is not true here, but is true in standard partner-choice models where a partner is kept if it shows the right behavior. Consider alternative like "partner search", "tag search" or "tag choice".
We have removed all mention of "partner recycling" from the main text and supplementary information, and have used alternatives like "partner search", as you suggest.
p 5 par 3. I have trouble figuring out how search costs work. Are you making assumptions here for mathematical convenience or to match biology?
Our previous generational search cost function was an approximation, taken for mathematical convenience. However, we have opted to change this function, so that now, c_search straightforwardly refers to the cost of abandoning one social partner and reassociating for a new social encounter. This results in a more complicated generational search cost function:
p 6 top. "density dependence is bland" seems an odd description.
We have removed this description, which was superfluous anyway.
p. 7 par 1. The two probabilities making up the cost are given in one order verbally and in the opposite order in the math which is unnecessarily confusing.
We have removed the "simple model" from the SI, which means that the text / maths you are referring to no longer features.
eqns 1 and 2. I'm having trouble seeing that how the denominator of the fractions work. Is there a series argument here? Added later. I get this now after reading your later explanation of equation 25. So maybe some of that explanation needs to be moved up to here. But I would also more explicitly add, because this is what I was failing to see, that this same probability applies to every step of the search process -even as the numbers of new interactions decline, the probability stays the same.
We have added the text to the SI (page 11), which hopefully clarifies: "Because newly encountered partners are chosen with replacement of individuals that were previously encountered during the social search, the probability of abandoning a given partner is the same, no matter how many encounters the focal individual has already had that generation." The series argument is also now made explicit in the derivation of the generational partner search cost in the SI (page 13).
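The series argument can be illustrated with a small sketch. It assumes, for illustration only, that each encounter is a mismatch with probability 1 - p and that a mismatched individual abandons the partner and re-associates with probability α, so that the chance of "one more abandonment" is the same constant q = α(1 - p) at every step. The expected number of abandonments is then the geometric sum q + q² + … = q / (1 - q). This is not the SI's generational search cost function, which is not reproduced here; the function names are hypothetical.

```python
def expected_abandonments(alpha: float, p: float) -> float:
    """Closed form of sum_{k>=1} q**k with q = alpha * (1 - p)."""
    q = alpha * (1.0 - p)
    if not 0.0 <= q < 1.0:
        raise ValueError("need 0 <= alpha*(1-p) < 1 for the series to converge")
    return q / (1.0 - q)


def expected_abandonments_series(alpha: float, p: float, terms: int = 10_000) -> float:
    """Same quantity from the truncated series, as a numerical check."""
    q = alpha * (1.0 - p)
    return sum(q ** k for k in range(1, terms + 1))


if __name__ == "__main__":
    # Hypothetical values: alpha = 0.9, tag frequency p = 0.2.
    print(expected_abandonments(0.9, 0.2))
    print(expected_abandonments_series(0.9, 0.2))
```

Because encounters are drawn with replacement, q does not depend on how many encounters have already happened, which is what allows the sum to collapse to a closed form.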
eqns 1 and 2. Doesn't your A term imply that the fitness components like b and c effectively vary over the generations as the number of altruistic interactions changes? Normally b and c are defined in terms of relative fitness and they go into the calculation of mean fitness, so mean fitness changes while b and c remain constant. Here it seems the reverse, mean fitness stays constant, implying that b and c change? I guess it boils down to I'm thinking about hard selection (of the global population) and you are modeling soft selection. It might be worth pointing out.
In our model, b & c do indeed stay fixed, as is standard in social evolution models, as you rightly say. Mean fitness stays constant at 1, as is standard in population genetic models where population size is constant. The two things are consistent. Specifically, any increase in mean fitness caused by cooperative interactions is exactly countered by increased competition. In other words, the "A term" captures the effect of competition on fitness. The "A term" is a function of, amongst other things, the number of social interactions that have taken place across the population that generation. The A term accordingly varies each generation such that mean fitness remains at 1 (in other words, the A term is not a constant). We hope it is now clear how it is both true that b & c are constant and mean fitness is 1. We note also that, in our model, because population size is constant, absolute fitness (defined as the number of offspring that survive through one iteration of the lifecycle) and relative fitness (defined as absolute fitness divided by mean absolute fitness) are identical quantities. Our approach is therefore fully consistent with social evolution approaches where "b and c are defined in terms of relative fitness". We have added some extra explanation of these issues to the SI (bottom of page 8).
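A minimal numerical sketch of this point follows. It assumes a simple form for the normalisation (dividing each individual's raw payoff by the population mean), which stands in for the "A term" purely for illustration; the SI's actual expression is not reproduced, and all names and numbers are hypothetical.

```python
def relative_fitness(help_received: list[int], help_given: list[int],
                     b: float, c: float) -> list[float]:
    """Fitness after competition, normalised so the population mean is 1.

    Raw payoff is 1 + b*(help received) - c*(help given); dividing by the
    mean raw payoff plays the role of a competition term that changes each
    generation while b and c stay fixed (assumed form, for illustration).
    """
    raw = [1.0 + b * r - c * g for r, g in zip(help_received, help_given)]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


if __name__ == "__main__":
    b, c = 0.3, 0.1                                            # fixed across generations
    few = relative_fitness([1, 0, 0, 0], [0, 1, 0, 0], b, c)   # one helping act
    many = relative_fitness([1, 1, 1, 0], [1, 1, 0, 1], b, c)  # three helping acts
    print(sum(few) / len(few), sum(many) / len(many))          # both means are 1
```

The normalising factor differs between the two hypothetical generations because the number of helping acts differs, yet b and c are unchanged and mean fitness is 1 in both.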
p 13, an aside regarding Levin & Grafen. You cite reference 42 for both the old and the newer way of calculating inclusive fitness. Should it be two different references?
Yes, there are two references: the old way of calculating inclusive fitness is the original Hamilton (1964) calculation of inclusive fitness. This Hamilton (1964) reference was already there, but it was written inline, so you may have missed it! We have moved the reference to the end of the sentence so it stands out more.
We have removed the "simple model" from the SI, which means that the figure you are referring to no longer features.
eqn 17 and below. So, your number of tags is not the countable number of tags but a sort of effective tag number based on frequencies?
Yes, that is right. We have added the following text to the SI (bottom of page 28): "We note that this metric is not the countable number of tags; such a measure would be misleading, because it would give equal weight to tags that are limitingly rare and exceedingly common. Rather, this metric is an effective tag number based on tag frequencies."
p 33 par 6 add a reference to Fig S3 when you mention being stretched on the y-axis?
We have removed the "simple model" from the SI, which means that the text you are referring to no longer features.
No, we mean most of the total parameter space, i.e. most combinations of m, N and r. This is because, if infinitely many tags are (hypothetically) segregating, then individuals will be able to pick out kin with no mistakes (highly precise kin recognition). In this scenario, conditional helping is nearly always favoured over defection, as long as b>c.
p 65 par 2. Is this agent-based model identical to your island model, other than being agent-based?
The differences are: (1) The mathematical model assumes weak selection (low magnitude of c, b, c_search), whereas the agent-based model permits stronger selection. (2) The mathematical model assumes that there is no stochastic variation in the genetic composition of demes, which is a reasonable assumption if deme size (N) is large. The agent-based model accounts for this stochastic variation, so is accurate for any deme size (N). We have emphasised these differences in SI (pages 31-34).
p 65 par 5-6 I'm not seeing the connection between these two paragraphs. The first says that conditional altruism will fall to close to zero (but held above by mutation). This will maintain rare tags. The second paragraph has those rare tags and conditional altruism being favored, the latter to fixation. What specifically has changed to cause conditional altruism to switch from being selected against to being selected to fixation?
(N.B. We assume you were referring to p68 rather than p65.) In the absence of trait mutation, the conditional helping allele may go to a population frequency of zero. However, in the presence of trait mutation, the conditional helping allele will not go to zero; rather, it will go to a mutation-selection balance equilibrium. Trait mutation therefore ensures that there is always trait variation in the population. And as long as there is trait variation, rare tags can gain an advantage over common tags, because they will be in linkage disequilibrium with the conditional helping allele. This is why, in some cases, tag diversity is lost when there is no trait mutation, but sustained when there is trait mutation. The difference is that, when there is no trait mutation, trait variation may be completely lost (i.e. all individuals become defectors), which removes any benefit of tag rarity, preventing tags from equalising in frequency. However, when there is trait mutation, trait variation is never lost, meaning rare tags can always gain an advantage (more cooperative interactions). If this advantage outweighs the costs of tag rarity (reduced interaction rate or costlier partner search), tags will begin to equalise in frequency. Eventually, tag equalisation will bring the previously-common tags to a low enough frequency that altruism is selected, even amongst individuals bearing these tags. At this point, altruism is universally favoured, resulting in high conditional altruism alongside stable tag diversity.
So to directly answer your question "what specifically has changed to cause conditional altruism to switch from being selected against to being selected to fixation?", the relative frequencies of different tags have changed. In the no-trait-mutation scenario, the common tag never gets pulled to a low enough frequency that cooperation is favoured amongst individuals bearing this tag. In the trait mutation scenario, long-term trait variation gives rare tags a long-term advantage, pulling the common tag to a lower and lower frequency until altruism is universally favoured.
We hope that this clarifies things -we can appreciate that the effect of trait mutation is very subtle. We have added in some more explanation of this process to the SI (page 37), to hopefully make it clearer.
The manuscript addresses a very interesting question, though I have a few comments.
1. Some researchers argue that there are no examples of genetic kin recognition, including one of the authors (Grafen 1990 argued that the only evidence comes from one species of marine invertebrates, Botryllus schlosseri), and yet this manuscript ignores this issue.
If there are good examples in the literature, then these should be addressed, but if genetic kin recognition does not exist or there are few or no good examples, then the point of such a model is unclear -and its importance is dubious.
Since Grafen's 1990 review of genetic kin recognition, many instances of genetic kin recognition have been discovered, and these are cited in the main text (lines 45-46).
2. The introduction (Crozier's Paradox) summarizes the theoretical models that have addressed this topic, but it does not accurately summarize the field or previous theoretical work.
Contrary to what the authors assert, it is not widely accepted that kin recognition via genetic cues is evolutionarily unstable (p.2, line 29), as papers cited and the authors themselves have helped to keep this hypothesis viable. So, this assertion is inaccurate, which can be easily shown by any review on this topic.
The authors state that previous models find that genetic kin recognition is only stable under 'very restrictive conditions' (p. 2, line 39; line 227). The model by Rousset and Roze (2007), which is cited, found that kin recognition can be maintained given certain assumptions, i.e., spatial population structure (from low dispersal) and recombination between matching and helping loci, and negative-frequency dependent selection from parasites, but these are realistic, not unrealistically restrictive. It is unclear which aspects of this and other models are unrealistic or overly restrictive.
Rousset & Roze (2007) found that, under conditions of low migration (m<0.1), tight linkage (r<0.1), and strong selection (high magnitude of b and c), two recognition alleles (tags) can be maintained stably alongside low / intermediate frequency of a conditional cooperation allele. For any more than two tags to be maintained, there also needs to be tag mutation (i.e. tag diversity of more than 2 tags cannot be maintained by selection on social behaviour alone).
These requirements, in our view, are indeed restrictive, and this perspective was also conveyed in Rousset & Roze (2007) themselves, who say that: "in some cases, the population evolves toward an equilibrium where both the helping and matching locus are polymorphic. However, this only occurs under rather restrictive conditions"; "a stable polymorphism seems possible only under a restricted set of parameters and initial conditions. In most biological settings, nonhelper mutants at additional unlinked (r = 0.5) loci would be expected to destabilize any preexisting polymorphism not maintained by extrinsic forces"; "selection alone appears generally insufficient to maintain polymorphism at the recognition locus [although a low tag mutation rate may stabilise it]".
Of course, it is true that Rousset & Roze (2007) found that genetic kin recognition can be stabilised if there exists extrinsic balancing selection due to an additional role of recognition alleles in e.g. parasite resistance. We have therefore changed the sentence "It has become widely accepted that kin recognition via genetic cues is not usually evolutionarily stable", and added ", except when genetic cues are maintained for some reason unrelated to social behaviour." We have also changed "very restrictive" to "restrictive", which is how Rousset & Roze (2007) described their own results.
More generally, we have described in detail why previous demonstrations of the evolution of kin discrimination based on genetic cues are unrealistic / restrictive, both in the SI (pages 38-39) and main text (under the "Alternative scenarios and genetic architecture" heading on p16).
Grafen (1990) proposed that the main problem with Crozier's Paradox is that it assumes that there are no cheats, but this issue is buried in the supplement.
We have added "Crozier's original statement of the paradox did not permit cooperative cheats, meaning this coevolution between cooperation and kin recognition could not be captured" to the main text (lines 148-150).
3. The assumptions of the authors' model allow multiple encounters, which is expected with realistic spatial populations structure, which is in line with previous models.
Previous models have indeed permitted multiple encounters, but in all previous models, partners cannot be abandoned in favour of new encounters. In all previous models, therefore, social interaction rate is determined by the frequency of an individual's tag (our α=0 case). Our model is novel because it permits multiple social encounters before each social interaction (partner search), meaning social interaction rate is not necessarily fully determined by the frequency of an individual's tag. We see that we have not always included the "before each social interaction" qualification when talking about multiple social encounters. We have now amended this in the text.
4. The authors assert that their conclusions are robust to several It is crucial to show that tag diversity is not eroded by genetic drift (p 7, line 151), and this issue is too important for the supplements and should be addressed in the manuscript. Also, the paper should address how robust their results are against these assumptions.
We had already shown, using an agent-based finite-population model, that tag diversity can be maintained in a finite population, implying that balancing selection on tags is overriding genetic drift (SI pages 31-32). However, to avoid any doubt that balancing selection is in operation and overrides drift, we have extended this analysis, adding the new section "Balancing selection in the finite population model" to the SI (pages 32-34).
In this new section, we run a version of our agent-based simulation, in which there is no tag mutation, for different parameter combinations. Tag diversity is eventually lost due to drift (this is inevitable in a finite population, if the simulation is run for long enough, even under balancing selection). We record the time taken for tag diversity to be lost (fixation time), and divide it by the time taken for tag diversity to be lost in the corresponding neutral scenario (i.e. where coefficients of selection are set to zero). We show that, in certain regions of parameter space, this ratio of fixation times exceeds 1 (i.e. fixation time is greater when there is selection) and increases with population size. As we explain in the SI, these properties provide strong evidence that balancing selection is in operation, and overrides genetic drift, allowing tag diversity to be maintained. We also demonstrate, using an analogous approach, that balancing selection is more likely to override drift, sustaining tag diversity, when: the partner search probability (α) is higher, partner search is less costly (lower c_search), and the strength of selection on social behaviour (b, c) is increased (this last result recovers an important finding of Rousset & Roze (2007)).
The methodological approach that we used to show that balancing selection on tags overrides genetic drift closely follows one utilised by Rousset & Roze (2007). The methodology and results are elaborated in more detail in the SI (pages 32-34), and the results are also conveyed in the main text (lines 229-231 & Fig. 4f).
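The logic of comparing fixation times can be sketched with a toy Wright-Fisher simulation. This is not the authors' agent-based model: it has a single well-mixed population, no helping locus, and a made-up rare-tag advantage controlled by a hypothetical parameter s, chosen only so that there is some balancing selection to compare against the neutral case (s = 0). All names, parameter values and the generation cap are assumptions.

```python
import random


def time_to_tag_loss(freqs, n, s=0.0, rng=None, max_generations=100_000):
    """Generations until a single tag fixes in a population of size n.

    s = 0 is neutral drift; s > 0 gives each tag a toy fitness of
    1 + s*(1 - frequency), i.e. an advantage to rarity (assumed form).
    Returns max_generations if diversity has not been lost by then.
    """
    rng = rng or random.Random()
    tags = list(range(len(freqs)))
    counts = [round(f * n) for f in freqs]
    counts[0] += n - sum(counts)          # absorb rounding so counts sum to n
    for generation in range(1, max_generations + 1):
        weights = [k * (1.0 + s * (1.0 - k / n)) for k in counts]
        offspring = rng.choices(tags, weights=weights, k=n)
        counts = [offspring.count(t) for t in tags]
        if max(counts) == n:              # one tag has reached fixation
            return generation
    return max_generations


def fixation_time_ratio(freqs, n, s, replicates=20, seed=0):
    """Mean time to tag loss with selection divided by the neutral mean."""
    rng = random.Random(seed)
    selected = sum(time_to_tag_loss(freqs, n, s, rng) for _ in range(replicates))
    neutral = sum(time_to_tag_loss(freqs, n, 0.0, rng) for _ in range(replicates))
    return selected / neutral


if __name__ == "__main__":
    for n in (10, 20, 40):                # hypothetical population sizes
        print(n, fixation_time_ratio([0.5, 0.3, 0.2], n, s=0.2))
```

In a sketch of this kind, a ratio above 1 that grows with population size is the signature one would look for when asking whether balancing selection, rather than drift, is governing how long tag diversity persists.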
5. The authors state that their model makes only conservative assumptions (p 13, line 230), but they do not explain why --or why they are more conservative than previous models.
We do explain why: there is a reference in the main text to a section of the SI, in which we discuss these assumptions in detail (indeed, in too much detail for inclusion in the main text).
6. The main question here is: 'can genetic recognition evolve with fewer assumptions of previous models?' The authors suggest that host-parasite interactions, the solution usually cited to solve Crozier's Paradox, 'may be a red herring' (p 13, line 237). If so, then the authors should clarify why their model assumptions are not only simpler but also more realistic, and acknowledge that their model is not a mutually exclusive alternative.
We have added a detailed discussion of this to both the main text (lines 292-312) and SI (pages 38-39). In particular, we argue that host-parasite interactions is an incomplete theory, because it gives no account of why a locus under extrinsic balancing selection should be "chosen" as the kin recognition locus. In this sense, it may not be a mutually exclusive alternative to the solution presented here, but more theoretical work is needed to evaluate this properly. This is in addition to the discussion in the SI (page 38) and main text (lines 280-291) that explains why our model assumptions are simpler and more realistic than competing hypotheses that don't rely on extrinsic balancing selection (i.e. our solution is the only one that works: (i) under weak selection; (ii) without tag mutation; (iii) for migration and recombination values that aren't unrealistically small).
7. There are many terms that are not defined in the manuscript, such as 'kin recognition', 'pedigree relatedness', etc, and it is critical to define terms because they are used differently by different researchers, e.g. Grafen (1990) was criticized for generating semantic confusion over the term 'kin recognition'. The authors contrast genetic kin recognition with environmentally determined phenotypic tags, but then use an example of 'grew up in the same nest' (p. 2, line 41), which is not a phenotypic label. There are reviews that have addressed these semantic issues, but they are not cited.
We have added the text "Individuals are therefore expected to evolve kin discrimination, which is the conditional helping of relatives that are identified (kin recognition) through either genetic or environmental cues" to the introduction of the main text.
We now introduce pedigree relatedness in the following context (lines 142-143), to emphasise that we just mean an individual's common ancestry: "As tags become more common, they will become less useful cues of the individual's common ancestry (pedigree relatedness; Supp. Info. 3e iii)". We have also added the reference to Supp.